On training targets for deep learning approaches to clean speech magnitude spectrum estimation
Authors
Abstract
Estimation of the clean speech short-time magnitude spectrum (MS) is key for speech enhancement and separation. Moreover, an automatic speech recognition (ASR) system that employs a front-end relies on clean speech MS estimation to remain robust. Training targets for deep learning approaches to clean speech MS estimation fall into three categories: computational auditory scene analysis (CASA), MS, and minimum mean square error (MMSE) estimator training targets. The choice of training target can have a significant impact on speech enhancement/separation and robust ASR performance. Motivated by this, the training target that produces enhanced/separated speech at the highest quality and intelligibility, and that which is best for an ASR front-end, is found. Three different deep neural network (DNN) types and two datasets, which include real-world nonstationary and coloured noise sources at multiple signal-to-noise ratio (SNR) levels, were used for evaluation. Ten objective measures were employed, including the word error rate of the Deep Speech ASR system. It is found that training targets that estimate the a priori SNR for MMSE estimators produce the highest objective quality scores. Moreover, it is established that the gain of MMSE estimators and the ideal amplitude mask produce the highest objective intelligibility scores and are most suitable for an ASR front-end.
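To make the target families named in the abstract concrete, the following is a minimal sketch of how three such targets could be computed from a known clean/noise pair. It is illustrative only: the function names `stft_magnitude` and `training_targets`, the frame length, the hop size, and the synthetic test signal are all assumptions, and the paper's exact definitions may differ. The Wiener gain is used here only as one simple example of a gain driven by the a priori SNR; it is not claimed to be one of the estimators evaluated in the paper.

```python
import numpy as np

def stft_magnitude(x, frame_len=512, hop=256):
    """Short-time magnitude spectrum from Hann-windowed frames (hypothetical helper)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))

def training_targets(clean, noise, eps=1e-12):
    """Return (ideal amplitude mask, instantaneous a priori SNR, Wiener gain)."""
    noisy = clean + noise
    S = stft_magnitude(clean)   # clean speech magnitude spectrum
    D = stft_magnitude(noise)   # noise magnitude spectrum
    X = stft_magnitude(noisy)   # noisy speech magnitude spectrum

    iam = S / (X + eps)             # ideal amplitude mask: |S| / |X|
    xi = S**2 / (D**2 + eps)        # instantaneous a priori SNR: |S|^2 / |D|^2
    wiener_gain = xi / (1.0 + xi)   # assumed example of an SNR-driven gain
    return iam, xi, wiener_gain

# Synthetic stand-in for speech: a 440 Hz tone plus white noise at 16 kHz.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 440.0 * t)
noise = 0.1 * rng.standard_normal(t.shape)
iam, xi, gain = training_targets(clean, noise)
```

In this sketch a network trained on `iam` would multiply its output with the noisy magnitude spectrum directly, whereas a network trained on `xi` would first map its estimate through a gain function before that multiplication.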
Related resources
Deep learning approaches to problems in speech recognition, computational chemistry, and natural language text processing
Deep learning approaches to problems in speech recognition, computational chemistry, and natural language text processing George Edward Dahl Doctor of Philosophy Graduate Department of Computer Science University of Toronto 2015 The deep learning approach to machine learning emphasizes high-capacity, scalable models that learn distributed representations of their input. This dissertation demons...
End-to-End Deep Learning Framework for Speech Paralinguistics Detection Based on Perception Aware Spectrum
In this paper, we propose an end-to-end deep learning framework to detect speech paralinguistics using perception aware spectrum as input. Existing studies show that speech under cold has distinct variations of energy distribution on low frequency components compared with the speech under ‘healthy’ condition. This motivates us to use perception aware spectrum as the input to an end-to-end learn...
A framework for estimation of clean speech b speech enhancemen
A novel multiple-input Kalman filtering (MIKF) framework is presented that estimates the clean speech signal by fusion of outputs from multiple speech enhancement systems. The MIKF framework generates a sample-by-sample minimum mean-square error estimate of the clean speech signal from these outputs. The residual noise in each input to the MIKF is modeled as an autoregressive (AR) process so th...
Experiments on deep learning for speech denoising
In this paper we present some experiments using a deep learning model for speech denoising. We propose a very lightweight procedure that can predict clean speech spectra when presented with noisy speech inputs, and we show how various parameter choices impact the quality of the denoised signal. Through our experiments we conclude that such a structure can perform better than some comparable sin...
Clean speech feature estimation based on soft spectral masking
In this paper, we first analyze the problems of speech and noise contamination process in noise-masking point of view, and propose a new approach to estimate degree of noise masking effect on clean speech distribution model based on sequential noise estimation. Sequential noise estimation is performed frame-by-frame using interacting multiple model (IMM) algorithm, so that realtime implementati...
Journal
Journal title: Journal of the Acoustical Society of America
Year: 2021
ISSN: 0001-4966, 1520-9024, 1520-8524
DOI: https://doi.org/10.1121/10.0004823